{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Video Game Sales Prediction Weekend Hackathon 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The gaming industry is certainly one of the thriving industries of the modern age and one of those that are most influenced by the advancement in technology. \n",
    "- With the availability of technologies like AR/VR in consumer products like gaming consoles and even smartphones, the gaming sector shows great potential. \n",
    "- In this hackathon, you as a data scientist must use your analytical skills to predict the sales of video games depending on given factors. \n",
    "- Given are 8 distinguishing factors that can influence the sales of a video game. \n",
    "- Your objective as a data scientist is to build a machine learning model that can accurately predict the sales in millions of units for a given game."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings(action='ignore')\n",
    "\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "from sklearn.tree import DecisionTreeRegressor,ExtraTreeRegressor\n",
    "from sklearn.ensemble import RandomForestRegressor,ExtraTreesRegressor,\\\n",
    "                    GradientBoostingRegressor,BaggingRegressor,AdaBoostRegressor\n",
    "    \n",
    "import xgboost as xgb\n",
    "import lightgbm as lgb\n",
    "import catboost as cat\n",
    "\n",
    "from sklearn.model_selection import StratifiedKFold"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def metric(y_test, y_pred):\n",
    "    rmse = mean_squared_error(y_test, y_pred , squared=False)\n",
    "    return rmse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "train = pd.read_csv(\"data/Train.csv\")\n",
    "test = pd.read_csv(\"data/Test.csv\")\n",
    "sample = pd.read_csv(\"data/Sample_Submission.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtrain = train.copy()\n",
    "dtest = test.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((3506, 9), (1503, 8))"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.shape, test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>CONSOLE</th>\n",
       "      <th>YEAR</th>\n",
       "      <th>CATEGORY</th>\n",
       "      <th>PUBLISHER</th>\n",
       "      <th>RATING</th>\n",
       "      <th>CRITICS_POINTS</th>\n",
       "      <th>USER_POINTS</th>\n",
       "      <th>SalesInMillions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2860</td>\n",
       "      <td>ds</td>\n",
       "      <td>2008</td>\n",
       "      <td>role-playing</td>\n",
       "      <td>Nintendo</td>\n",
       "      <td>E</td>\n",
       "      <td>2.833333</td>\n",
       "      <td>0.303704</td>\n",
       "      <td>1.779257</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>731</td>\n",
       "      <td>wii</td>\n",
       "      <td>2012</td>\n",
       "      <td>simulation</td>\n",
       "      <td>Konami Digital Entertainment</td>\n",
       "      <td>E10+</td>\n",
       "      <td>13.200000</td>\n",
       "      <td>1.640000</td>\n",
       "      <td>0.215050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>495</td>\n",
       "      <td>pc</td>\n",
       "      <td>2019</td>\n",
       "      <td>shooter</td>\n",
       "      <td>Activision</td>\n",
       "      <td>M</td>\n",
       "      <td>4.562500</td>\n",
       "      <td>0.006410</td>\n",
       "      <td>0.534402</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2641</td>\n",
       "      <td>ps2</td>\n",
       "      <td>2002</td>\n",
       "      <td>sports</td>\n",
       "      <td>Electronic Arts</td>\n",
       "      <td>E</td>\n",
       "      <td>4.181818</td>\n",
       "      <td>0.326923</td>\n",
       "      <td>1.383964</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>811</td>\n",
       "      <td>ps3</td>\n",
       "      <td>2013</td>\n",
       "      <td>action</td>\n",
       "      <td>Activision</td>\n",
       "      <td>M</td>\n",
       "      <td>2.259259</td>\n",
       "      <td>0.032579</td>\n",
       "      <td>0.082671</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     ID CONSOLE  YEAR      CATEGORY                     PUBLISHER RATING  \\\n",
       "0  2860      ds  2008  role-playing                      Nintendo      E   \n",
       "1   731     wii  2012    simulation  Konami Digital Entertainment   E10+   \n",
       "2   495      pc  2019       shooter                    Activision      M   \n",
       "3  2641     ps2  2002        sports               Electronic Arts      E   \n",
       "4   811     ps3  2013        action                    Activision      M   \n",
       "\n",
       "   CRITICS_POINTS  USER_POINTS  SalesInMillions  \n",
       "0        2.833333     0.303704         1.779257  \n",
       "1       13.200000     1.640000         0.215050  \n",
       "2        4.562500     0.006410         0.534402  \n",
       "3        4.181818     0.326923         1.383964  \n",
       "4        2.259259     0.032579         0.082671  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ID                 2654\n",
       "CONSOLE              17\n",
       "YEAR                 23\n",
       "CATEGORY             12\n",
       "PUBLISHER           204\n",
       "RATING                6\n",
       "CRITICS_POINTS     1683\n",
       "USER_POINTS        2187\n",
       "SalesInMillions    3506\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>YEAR</th>\n",
       "      <th>CRITICS_POINTS</th>\n",
       "      <th>USER_POINTS</th>\n",
       "      <th>SalesInMillions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>count</td>\n",
       "      <td>3506.000000</td>\n",
       "      <td>3506.000000</td>\n",
       "      <td>3506.000000</td>\n",
       "      <td>3506.000000</td>\n",
       "      <td>3506.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>mean</td>\n",
       "      <td>2282.233600</td>\n",
       "      <td>2008.990302</td>\n",
       "      <td>3.790831</td>\n",
       "      <td>0.405824</td>\n",
       "      <td>2.171021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>std</td>\n",
       "      <td>1287.273422</td>\n",
       "      <td>4.304252</td>\n",
       "      <td>3.141781</td>\n",
       "      <td>0.455541</td>\n",
       "      <td>2.495396</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>min</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1997.000000</td>\n",
       "      <td>0.568966</td>\n",
       "      <td>0.000341</td>\n",
       "      <td>0.001524</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25%</td>\n",
       "      <td>1220.250000</td>\n",
       "      <td>2006.000000</td>\n",
       "      <td>1.738095</td>\n",
       "      <td>0.065966</td>\n",
       "      <td>0.965679</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>50%</td>\n",
       "      <td>2262.500000</td>\n",
       "      <td>2009.000000</td>\n",
       "      <td>2.766667</td>\n",
       "      <td>0.233333</td>\n",
       "      <td>1.866140</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>75%</td>\n",
       "      <td>3404.750000</td>\n",
       "      <td>2012.000000</td>\n",
       "      <td>4.621528</td>\n",
       "      <td>0.598333</td>\n",
       "      <td>2.792029</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>max</td>\n",
       "      <td>4523.000000</td>\n",
       "      <td>2019.000000</td>\n",
       "      <td>23.250000</td>\n",
       "      <td>2.325000</td>\n",
       "      <td>84.226041</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                ID         YEAR  CRITICS_POINTS  USER_POINTS  SalesInMillions\n",
       "count  3506.000000  3506.000000     3506.000000  3506.000000      3506.000000\n",
       "mean   2282.233600  2008.990302        3.790831     0.405824         2.171021\n",
       "std    1287.273422     4.304252        3.141781     0.455541         2.495396\n",
       "min       1.000000  1997.000000        0.568966     0.000341         0.001524\n",
       "25%    1220.250000  2006.000000        1.738095     0.065966         0.965679\n",
       "50%    2262.500000  2009.000000        2.766667     0.233333         1.866140\n",
       "75%    3404.750000  2012.000000        4.621528     0.598333         2.792029\n",
       "max    4523.000000  2019.000000       23.250000     2.325000        84.226041"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7f542067f510>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEGCAYAAABrQF4qAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAea0lEQVR4nO3df7TcdX3n8edrZm5uQgJJSILEJJggEUVlQSJgxV9VLLYuuFs8ousu7rFL7SnHturZpesWXbp7jrb7w1pxC6usbleLP7fN6aZFVsCyWmjCD8EEIyEg3CRIQn4QSHLvnZn3/vH9TjIZZjLfuXdu7s1nXo9z7snM9/uduZ+ZM3nN576/n8/nq4jAzMzSVZruBpiZ2dRy0JuZJc5Bb2aWOAe9mVniHPRmZomrTHcDWi1evDhWrlw53c0wMzuh3HfffbsiYkm7fTMu6FeuXMmGDRumuxlmZicUST/vtM+lGzOzxDnozcwS56A3M0ucg97MLHEOejOzxDnozcwS56A3M0ucg97MLHEDE/Qf/NK9fP77j053M8zMjruBCfqfPv0cf/nAtuluhpnZcTcwQT9WrbN11wuM7Dkw3U0xMzuuBiboq/Xskon/79Fd09wSM7Pja2CCfrxWB+DupqDfse8gtbqvmWtmaRuIoI8IxmtZoP/wsV3U6sH9T+7hjZ+5g//z8I5pbp2Z2dQaiKBvlG1WnzaPvQfGeWhkL9f/1U+oBzz7/Og0t87MbGoNRNA3yjZvPTtbk/+67zzMT7Y9B8BotT5t7TIzOx4GJOizHv3p8+dwztJT2PyL/Vy46lQARscd9GaWtgEJ+izMh8riba9cQqUk/sN7XsNQWYxWa9PcOjOzqTXjLiU4Fap5j36oXOLat63mn75uOS9fMo/hStmlGzNLXtJB//V7nwRg9wtjAGx4Yg8R8IGLzgBguFJyj97MkjcQpZvGWPlySUdtH66UXKM3s+QNRtBHh6AfcunGzNI3GEHf6NGrTY/epRszS9xABH39WKUb9+jNLHEDEfTVjkFfdo3ezJI3EEHf8WTskEs3Zpa+gQj6eqeTsS7dmNkAGIig73wy1qNuzCx9gxX0bXv0Lt2YWdoKBb2kyyRtlrRF0nVt9n9M0iZJD0n6vqSXNe2rSXow/1nbz8YX1Qj6UsurHR7yhCkzS1/XJRAklYEbgUuBEWC9pLURsanpsAeANRFxQNJvAX8EvC/fdzAizutzu3vSmDBVyZO+sTTCE7sOsP9Q9fD9xtIIZmYpKdKjvxDYEhFbI2IMuBW4ovmAiLgzIhpX3b4HWN7fZk7O4R790ZUbKmVRrbtHb2ZpKxL0y4Cnmu6P5Ns6+TDwN033Z0vaIOkeSe9p9wBJ1+THbNi5c2eBJvWmU42+UipRrQURvm6smaWryOqVarOtbTJK+iCwBnhL0+YzImK7pDOBOyQ9HBGPHfVkETcDNwOsWbOm76nbCPpKS5F+qCyCrLRTUbuXaWZ24ivSox8BVjTdXw5sbz1I0juATwKXR8ThC7FGxPb8363AXcD5k2jvhHQ6GVvJe/iN9erNzFJUJOjXA6slrZI0C7gKOGr0jKTzgZvIQv6Zpu0LJQ3ntxcDbwSaT+IeF50mTFXK2ctvLJFgZpairqWbiKhKuha4DSgDt0TERkk3ABsiYi3wx8A84FvKSiBPRsTlwKuAmyTVyb5UPtMyWue4qHaYMHWkR+8TsmaWrkJXmIqIdcC6lm3XN91+R4fH/Qh47WQa2A+1elASqDXoGz16l27MLGEDMTO2Xo8XlW3gSI9+3EMszSxhAxH01Wgf9ENln4w1s/QNRNDX6/Gi+jz4ZKyZDYaBCPpal9KNT8aaWcoGO+jdozezATAYQd+hRn/4ZKx79GaWsMEI+npQalOjH3KP3swGwMAEfeWYNXoHvZmla2CCvtS2Rp8HvcfRm1nCBiPoO9boPTPWzNI3GEHfcRy9Z8aaWfoGJ+jb9OhLEmXJPXozS9pABH2ntW4gv5ygh1eaWcIGIuirxwr6kjy80sySNhBBX+9wMhay2bEu3ZhZygYi6DudjIWsR++TsWaWssEJ+g49+iH36M0scQMf9JWyPGHKzJI2GEEf7WfGQn4y1j16M0vYYAR9Pah0qtGXSx51Y2ZJG5igP3aP3qUbM0tX8kEfEdSDYw6vHHeP3swSlnzQ1yIL8Y6jbtyjN7PEpR/0eW+94zj6smfGmlnakg/6xsjJzksgeBy9maUt+aBvjJH3OHozG1TJB32jKtOtRx/hXr2ZpSn5oO9Wox8qi+DISVszs9QUCnpJl0naLGmLpOva7P+YpE2SHpL0fUkva9p3taRH85+r+9n4IrqWbnyBcDNLXNegl1QGbgTeBZwDvF/SOS2HPQCsiYhzgW8Df5Q/9lTgU8BFwIXApyQt7F/zu+t6MracXzfWI2/MLFFFevQXAlsiYmtEjAG3Alc0HxARd0bEgfzuPcDy/PavALdHxO6I2APcDlzWn6YXc7h007VH7xOyZpamIkG/DHiq6f5Ivq2TDwN/08tjJV0jaYOkDTt37izQpOK6TZg63KN36cbMElUk6NslZNtUlPRBYA3wx708NiJujog1EbFmyZIlBZpUXKNHXzrGhUcAX3zEzJJVJOhHgBVN95cD21sPkvQO4JPA5REx2stjp1K30s1Q2SdjzSxtRYJ+PbBa0ipJs4CrgLXNB0g6H7iJLOSfadp1G/BOSQvzk7DvzLcdN42gr/hkrJkNqEq3AyKiKulasoAuA7dExEZJNwAbImItWalmHvAtZSWSJyPi8ojYLekPyb4sAG6IiN1T8ko6OFy68clYMxtQXYMeICLWAetatl3fdPsdx3jsLcAtE23gZBU+GesevZklauBnxh4+GesevZklKvmgr3c9GesevZmlLfmgLz5hykFvZmlKPuirXWv0edB7HL2ZJSr5oK93rdF7ZqyZpS35oO9auil7ZqyZpS39oO9SuilJlCX36M0sWekH/eG1bjofUynLE6bMLFkDEfRlCXWo0UM28sbDK80sVYMR9MfqzpPNjnXpxsxSNRBBX+ryKisl+WSsmSUr/aCPoNwl6YfcozezhKUf9PWgfOzKTXYy1j16M0tU8kFfL1KjL3l4pZmlK/mgrxY9GetRN2aWqOSDvtCom5LH0ZtZupIP+noUDHr36M0sUckHfWPC1LG4dGNmKRuMoO/Soy+7dGNmCRuIoO90YfAGl27MLGXpB33E4atIdeKgN7OUpR/0RWv0Lt2YWaIGIuiLlG7qcWRJYzOzlAxE0BeZMAUwVnWv3szSk37QR4HSTf5F4KA3sxSlH/SFevTZ/tFq7Xg0yczsuHLQc6RHP+oevZklyEEPVPL16sc88sbMElQo6CVdJmmzpC2Srmuz/82S7pdUlXRly76apAfzn7X9anhRRda6aewfHXfQm1l6Kt0OkFQGbgQuBUaA9ZLWRsSmpsOeBD4EfKLNUxyMiPP60NYJKTKOfiiv0btHb2Yp6hr0wIXAlojYCiDpVuAK4HDQR8QT+b4ZlZT1COpBgR599ofN6LhPxppZeoqUbpYBTzXdH8m3FTVb0gZJ90h6T7sDJF2TH7Nh586dPTz1sdXzCVBFT8a6R29mKSoS9O1SspcppGdExBrgA8DnJL38RU8WcXNErImINUuWLOnhqY+tVjToy67Rm1m6igT9CLCi6f5yYHvRXxAR2/N/twJ3Aef30L5JqUXRHr1H3ZhZuooE/XpgtaRVkmYBVwGFRs9IWihpOL+9GHgjTbX9qdbo0Ze6LmrmCVNmlq6uQR8RVeBa4DbgEeCbEbFR0g2SLgeQ9HpJI8B7gZskbcwf/ipgg6QfA3cCn2kZrTOlGkFfZJli8BIIZpamIqNuiIh1wLqWbdc33V5PVtJpfdyPgNdOso0TdrhH70XNzGyAJT0ztvDJWC+BYGYJSzvoGydjC65e6aA3sxSlHfQFe/RlB72ZJSzpoC86YUoSlZJcozezJCUd9NWC4+gbx3h4pZmlKOmgr+cd9G41esA9ejNLVtJBX8uTvkiPvlIuuUZvZklKPOiLl27cozezVCUd9NVegr7sGr2ZpSnpoC+6BEJ2TMk9ejNL0kAEfeHSjVevNLMEJR30vZRuymV5PXozS1LSQX+kdNP9ZQ6VSu7Rm1mSBiLoC0+Yco/ezBKUdND3OurGPXozS1HSQV+r1xFQIOeplEqMjnt4pZmlJ/GgD8oloSJLILhHb2aJGoigL6LiGr2ZJSrpoK/2GvTu0ZtZgpIO+lo9Cs2KhWxRs7FqnciXNjYzS0XyQd9Ljx5wnd7MkpN00Gelm2Iv8XDQe70bM0tM0kHfS+mmXM7eCq9Jb2apST7oi5ZuhtyjN7NEJR301Xq9cNA3jnOP3sxSk3TQ93QyNi/duEdvZqlJPugLD6883KP3MghmlpZCQS/pMkmbJW2RdF2b/W+WdL+kqqQrW/ZdLenR/OfqfjW8iJ4mTJVdozezNHUNekll4EbgXcA5wPslndNy2JPAh4Cvtzz2VOBTwEXAhcCnJC2cfLOL6W0cvUfdmFmaivToLwS2RMTWiBgDbgWuaD4gIp6IiIeA1pT8FeD2iNgdEXuA24HL+tDuQiZSunGP3sxSUyTolwFPNd0fybcVUeixkq6RtEHShp07dxZ86u5qvUyYKnvUjZmlqUgKtusSF10QptBjI+LmiFgTEWuWLFlS8Km766VGX/bJWDNLVJGgHwFWNN1fDmwv+PyTeeyk9TZhysMrzSxNRYJ+PbBa0ipJs4CrgLUFn/824J2SFuYnYd+ZbzsuelsCwaUbM0tT16CPiCpwLVlAPwJ8MyI2SrpB0uUAkl4vaQR4L3CTpI35Y3cDf0j2ZbEeuCHfdlz0MjPWPXozS1WlyEERsQ5Y17Lt+qbb68nKMu0eewtwyyTaOCH1elCPYhcGBy+BYGbpSnZm7Hg9C+ziFx7x8EozS1OyQd8I7KI9+pKUXU7Qo27MLDHJBv14LRvFWTToAYYrJffozSw5CQd9o3RT/CXOqpRcozez5CQb9L2WbgCGK2X36M0sOekGfa33oJ9VKfni4GaWnGSD/kjppreg98lYM0tNskE/sdKNT8aaWXqSDfrxCZZufDLWzFKTbNCPVSc2vNJBb2apSTboJ1ajLzvozSw5yQa9a/RmZplkg37iNXqPujGztCQb9GMTmBnrHr2ZpSjdoJ9g6cY1ejNLTbJBP7FFzbwEgpmlJ+Ggn9jMWAe9maUm2aCfeOnGJ2PNLC3pBv1ERt2US9QDql7YzMwSkmzQT3R4Jfi6sWaWlmSDfqxap6TsEoFFDedB7zq9maUk2aAfr9V76s1DtgQCuEdvZmlJOOij56B3j97MUpRs0I9W6z3NioXmGr1H3phZOpIN+omUbuYMZaWbA2MOejNLh4O+yWmnDAPwi+cOTUWTzMymRdJB38usWICl8+cA8LSD3swSkmzQj1V779EvmjuLobLYvtdBb2bpKBT0ki6TtFnSFknXtdk/LOkb+f57Ja3Mt6+UdFDSg/nPn/W3+Z2NTWDUTakkTp8/mx37Dk5Rq8zMjr9KtwMklYEbgUuBEWC9pLURsanpsA8DeyLiLElXAZ8F3pfveywizutzu7sar/ZeugFYesocduxzj97M0lGkR38hsCUitkbEGHArcEXLMVcAX81vfxt4u9TDlNQpMDaBk7EASxe4R29maSkS9MuAp5ruj+Tb2h4TEVVgH7Ao37dK0gOSfiDpTe1+gaRrJG2QtGHnzp09vYBOJjLqBrITsk/vO0S9Hn1ph5nZdCsS9O3SsjUFOx2zAzgjIs4HPgZ8XdIpLzow4uaIWBMRa5YsWVKgSd2NTWDCFMDS+bMZrwXPvjDWl3aYmU23Ikk4Aqxour8c2N7pGEkVYD6wOyJGI+JZgIi4D3gMeMVkG13EhEs382cDuHxjZsnoejIWWA+slrQK2AZcBXyg5Zi1wNXA3wNXAndEREhaQhb4NUlnAquBrX1r/TFMpHTz9XufZNueLOC/sf4pfrLtOQA+cNEZfW+fmdnx0jXoI6Iq6VrgNqAM3BIRGyXdAGyIiLXAl4E/l7QF2E32ZQDwZuAGSVWgBnwkInZPxQtpNV7tfXglwPyThgDYd3C8300yM5sWRXr0RMQ6YF3Ltuubbh8C3tvmcd8BvjPJNk7I2ARmxgLMnVWmXJKD3sySkezM2PEJzIwFkMT8OUMOejNLRrJBP9GTsYCD3sySkmTQR8SESzfgoDeztCQZ9LV6ENHbhcGbzZ8zxHMHx6mHJ02Z2YkvyaAfr2UBXZ7AhCnIgr4e8PxotZ/NMjObFkkG/Vgtu+brZEo3APsOuHxjZie+NIM+v7j3ZEo34LH0ZpaGJIN+vDa5oD/FQW9mCUk66Cdaupk7q0ylJPYe8MJmZnbiSzLoJ1u6kcTKxXN5aNs+qvmXhpnZiSrNoJ9k6QbgTWctZv+hKj8e2duvZpmZTYskg/7I8MqJB/1Zp81j6fzZ3P3oLl+ExMxOaEkGfaN0M5ELjzRI4pKzFvPM/lHu+tkz/Wqamdlxl2TQT3bUTcO5yxcwf84QN/3guCyhb2Y2JZIM+n7U6BuPf/3Khdz7+G6PwDGzE1aaQV+d3PDKZmecOheAh7ftm/RzmZlNhySDvl+lG4BlC+YA8NCIg97MTkwO+i7mzCqzctFJPORhlmZ2gkoy6PtZuoHspOzD7tGb2QkqzaDvwzj6Zucun8/2fYfYuX+0L89nZnY8JRn045NcAqHVa5fNB+DhbS7fmNmJpzLdDZgKRxY168/32KbtzyHga/c8ydP7Rtl/aJx6wG+99eV9eX4zs6mUZNBPdlGzVsNDZRafPMy2vQc5NF7ji3c9hgS/8aZVDJWT/KPIzBKSZEo1evR9ynkAli+Yw7Y9B/nrh3aw7+A4ew+Ms+7hHf37BWZmUyTJoB+rBbPKJaT+Jf2yhXPYP1rl/if38JZXLGHJvGFu+sFWwhcQN7MZLs2gr9aZVenvS1ueT5xaOn82b3/Vabxp9WI27XiOH255tq+/x8ys35Ks0Y/X6gyV+1i3AZYtPIlfevkiLlx1KpVSifNWLODuLbv44l1bkODxXS9Qkjh17izOXDKXV7zk5L7+fjOziUo46Pvboy+XxLvPfenh+5VyifNXLOB7m37Bjx57ca/+vBUL+OcXv4w3vHwRS+fP7msZycysF4WCXtJlwJ8AZeBLEfGZlv3DwP8ELgCeBd4XEU/k+34f+DBQAz4aEbf1rfUtNm7fxzlLT5mS0k07l5y1mAUnDXHy7CEWzxsG4IXRKk88+wL3bH2Wj3/rxwAMV0q89ewl/N6lr+CVp58CwIGxKgfHapRL4qRZlePSXjMbTF2DXlIZuBG4FBgB1ktaGxGbmg77MLAnIs6SdBXwWeB9ks4BrgJeDbwU+L+SXhERtX6/kKd2H+DyL/yQV55+MrV6djJ2qlXKJc5bsfCobfPnDPHSBXO4+MxFjOw+wPZ9h/jFc4f40WPP8r1Nd3PJWYvZse8Qj+18nsZ53FmVEhetOpWLz1zEs8+P8dOnn2OsWmfpgjksWzCHs06bx1mnzWPX/lEe3raP/YeqnHfGAs5fsYD5Jw1RKYld+8d49Jn97Nw/ylmnzeNVS09hzlCZA+O1w18qB8ay2wfGaswdrrBy0VwWnjRU6K+Nej147tA4+w9VAZCyi7OocRvl27Lbw0Ml5s2qUGoa+hQR1ANq9ch+IqjVsn+r9Tr1Okf92zimWgvq+Zu1eN4wp508TKVcolqrc3C8xsHxGqPjdeYOV5g/Z6hvw2rNptILo1UefGovG7fvY8XCk7hg5UJOO3n2lPyuIj36C4EtEbEVQNKtwBVAc9BfAXw6v/1t4AvK0uMK4NaIGAUel7Qlf76/70/zjzh9/mw+++vn8oU7HuWJZw/wytOnt0Zekjhj0VzOWJQtc3zpOS/h7362i43b97Hk5GHedvZpzJ1VJoA9L4yx+en93P3oLobK4iWnzGaoXGLrrhfYd2CcWtPInpKyL4Zbfvh4X9o5XClRkqhHEGRhHMHh2+WSqJRKjFZr9HpFRQlmV8rUIqjXg2qfLslYUlZKa1wysvV3zhkqv2h7p8FR2avuXfYV12HfBL5njjV4a6JtbOhXW3t5D3sZjNbahtb2tmtj4/kbv7vxmc03tm3X4edt+kc6ep/UeK4j/w9out/aJiFanvZwZ6fb6zw0/uL/U284cxF/cc3FL37Bk1Qk6JcBTzXdHwEu6nRMRFQl7QMW5dvvaXnsstZfIOka4Jr87vOSNhdq/TH8HLgNFgO7Jvtcx9uW4/NrTsj35jjw+9KZ35vO+vLe/By49Tcn/PCXddpRJOjbfee3fl93OqbIY4mIm4GbC7SlJ5I2RMSafj9vCvzetOf3pTO/N53N9PemSCF7BFjRdH85sL3TMZIqwHxgd8HHmpnZFCoS9OuB1ZJWSZpFdnJ1bcsxa4Gr89tXAndENmV0LXCVpGFJq4DVwD/0p+lmZlZE19JNXnO/FriNbHjlLRGxUdINwIaIWAt8Gfjz/GTrbrIvA/Ljvkl24rYK/PZUjLg5hr6XgxLi96Y9vy+d+b3pbEa/N/JaLWZmafMsHTOzxDnozcwSl2TQS7pM0mZJWyRdN93tmU6SVki6U9IjkjZK+p18+6mSbpf0aP7vwm7PlSpJZUkPSPrr/P4qSffm78038kEIA0fSAknflvTT/PPzBn9uQNLv5f+XfiLpLyTNnumfmeSCvmnJhncB5wDvz5diGFRV4OMR8SrgYuC38/fjOuD7EbEa+H5+f1D9DvBI0/3PAv81f2/2kC3xMYj+BPjbiHgl8I/I3qOB/txIWgZ8FFgTEa8hG6DSWPZlxn5mkgt6mpZsiIgxoLFkw0CKiB0RcX9+ez/Zf9ZlZO/JV/PDvgq8Z3paOL0kLQd+DfhSfl/AL5Mt5QED+t5IOgV4M9mIOiJiLCL24s8NZKMV5+Rzhk4CdjDDPzMpBn27JRtetOzCIJK0EjgfuBd4SUTsgOzLADht+lo2rT4H/Gugnt9fBOyNiGp+f1A/P2cCO4H/kZe1viRpLgP+uYmIbcB/Ap4kC/h9wH3M8M9MikFfaNmFQSNpHvAd4Hcj4rnpbs9MIOndwDMRcV/z5jaHDuLnpwK8DvhvEXE+8AIDVqZpJz8ncQWwimxF3rlkZeJWM+ozk2LQe9mFFpKGyEL+axHx3XzzLyQtzfcvBZ6ZrvZNozcCl0t6gqzE98tkPfwF+Z/lMLifnxFgJCLuze9/myz4B/1z8w7g8YjYGRHjwHeBX2KGf2ZSDPoiSzYMjLzm/GXgkYj4L027mpetuBr4q+PdtukWEb8fEcsjYiXZ5+SOiPhnwJ1kS3nA4L43TwNPSTo73/R2shnug/65eRK4WNJJ+f+txvsyoz8zSc6MlfSrZD2zxpIN/3GamzRtJF0C3A08zJE69L8lq9N/EziD7MP73ojYPS2NnAEkvRX4RES8W9KZZD38U4EHgA/m11QYKJLOIztJPQvYCvxLss7hQH9uJP174H1kI9oeAH6DrCY/Yz8zSQa9mZkdkWLpxszMmjjozcwS56A3M0ucg97MLHEOejOzxDnobcaT9Ml8tcCHJD0o6aJjHPsVSVd22n+Mx31I0he6HLNSUkj6w6ZtiyWNNx4r6SOS/kVrWyTdJWlNfnudpAW9ttFsorpeStBsOkl6A/Bu4HURMSppMdm47umyNW/PH+T33wtsbOyMiD/r9gQR8atT0zSz9tyjt5luKbCrMfkkInZFxHZJ10tan68JfnM+S/Eoki6Q9ANJ90m6rWnq/kclbcr/Qri1zeO+Iunzkn4kaWvLXwgHgUcavXOyiTPfbHrspyV94lgvSNIT+RcWkj6Wv4afSPrdfNvKfP33/57/JfM9SXOKtN2sHQe9zXTfA1ZI+pmkL0p6S779CxHx+nxN8DlkvezD8vV9/hS4MiIuAG4BGjOkrwPOj4hzgY90+L1LgUvy5/1My75bgavyJY5rTHBdE0kXkM02vYjsWgH/StL5+e7VwI0R8WpgL/DrPbTd7CgOepvRIuJ54ALgGrJlc78h6UPA2/Ir+jxMthjZq1seejbwGuB2SQ8C/45ssSmAh4CvSfog2TT2dv4yIuoRsQl4Scu+vwUuBd4PfGMSL+8S4H9HxAv56/wu8KZ83+MR8WB++z5gZQ9tNzuKa/Q240VEDbgLuCsP9t8EziW7ys9Tkj4NzG55mICNEfGGNk/5a2QX1bgc+ANJrV8SAM3rlBxVFoqIMUn3AR8n+4L5xz2/qDbPe4zfXyP7qwXatL1pHXSzttyjtxlN0tmSVjdtOg/YnN/ela+z326UzWZgSX4yF0lDkl4tqQSsiIg7yS44sgCYN4Gm/Wfg30TEsxN4bMPfAe/JV0KcC/wTsgXo2upj223AuEdvM9084E/z4YhVYAtZGWcv2YqcT5AtTX2UvNd9JfB5SfPJPuufA34G/K98m8iu87m3zbncY4qIjTSNtpmIiLhf0leAf8g3fSkiHlB2JbB2yrRp+2TaYIPBq1eamSXOpRszs8Q56M3MEuegNzNLnIPezCxxDnozs8Q56M3MEuegNzNL3P8H+viwUcR1mKUAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.distplot(train['SalesInMillions'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def k_fold_cross_valid(model,x_train,y_train,n_splits=5):\n",
    "    \n",
    "    X = x_train.copy()\n",
    "    y = y_train.copy()\n",
    "\n",
    "    from sklearn.model_selection import KFold\n",
    "    kf = KFold(n_splits=5)\n",
    "    kf.get_n_splits(X)\n",
    "    res = []\n",
    "\n",
    "    for train_index, test_index in kf.split(X):\n",
    "        \n",
    "        X_train, X_test = X.iloc[train_index], X.iloc[test_index]\n",
    "        y_train, y_test = y.iloc[train_index], y.iloc[test_index]\n",
    "\n",
    "        model.fit(X_train,y_train)\n",
    "        y_pred = model.predict(X_test)\n",
    "    \n",
    "        res.append(metric(y_test,y_pred))\n",
    "        \n",
    "    print(\"RMSE:\",np.array(res).mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_importance(model,features):\n",
    "   \n",
    "    model.fit(train[features],train[target])\n",
    "    importances = model.feature_importances_\n",
    "    indices = np.argsort(importances)\n",
    "\n",
    "    plt.title('Feature Importances')\n",
    "    plt.barh(range(len(indices)), importances[indices], color='g', align='center')\n",
    "    plt.yticks(range(len(indices)), [features[i] for i in indices])\n",
    "    plt.xlabel('Relative Importance')\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ID',\n",
       " 'CRITICS_POINTS',\n",
       " 'YEAR',\n",
       " 'USER_POINTS',\n",
       " 'CONSOLE',\n",
       " 'RATING',\n",
       " 'CATEGORY',\n",
       " 'PUBLISHER']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cat_feat = ['CONSOLE','CATEGORY', 'PUBLISHER', 'RATING']\n",
    "features = list(set(train.columns)-set(['SalesInMillions']))\n",
    "target = 'SalesInMillions'\n",
    "features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSE: 1.9461135494000286\n"
     ]
    }
   ],
   "source": [
    "model = cat.CatBoostRegressor(random_state=100,cat_features=cat_feat,verbose=0)\n",
    "k_fold_cross_valid(model,train[features],train[target],n_splits=5)\n",
    "model.fit(train[features],train[target])\n",
    "y_pred = model.predict(test[features])\n",
    "sub = pd.DataFrame(y_pred,columns=[target])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbQAAAEWCAYAAAAO4GKjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3deZRdVZ328e/DIKMCYU6wCdCIbWgMJCivjQqNA2gr8CpToxjtFmixUbtBEHlXIg4giNAgSxsUAWWUqQFRQIYWhRYqUAQCRAkJMqQZFQmEweR5/zi7XIdLDbcqlbqVU89nrbvq3H3O2fu374X6ZZ9zam/ZJiIiYnm3QqcDiIiIGA5JaBER0QhJaBER0QhJaBER0QhJaBER0QhJaBER0QhJaBER0QhJaBH9kDRf0iJJC2uv8UtZ506SHhmuGNts8yxJXxvJNvsiaYakH3c6jmieJLSIgX3I9pq112OdDEbSSp1sf2ksz7HH6JeEFjFEknaQdIukP0q6S9JOtX2flHSfpOckPSjpoFK+BvAzYHx9xNc6gmodxZWR4hGSZgHPS1qpnHeJpCclzZN0aJtxT5TkEuPDkv4g6WBJ20uaVfrzndrx0yT9WtKpkp6VdL+kXWr7x0u6QtIzkh6Q9OnavhmSLpb0Y0l/Ag4GjgL2KX2/q7/Pq/5ZSPp3SU9IWiDpk7X9q0k6UdJDJb5fSVqtje9oWmnrufL57d/O5xejV/61FDEEkiYAPwU+Dvwc2AW4RNKbbT8JPAH8A/Ag8C7gZ5Jut32HpN2AH9vepFZfO83uB3wQeApYAlwJ/Fcp3wT4haQ5tq9psxtvB7Ys8V1R+vEeYGXgTkk/sf3ftWMvBtYD/i9wqaTNbD8DnA/MBsYDbwauk/Sg7evLubsDewEHAKuUOv7a9sdqsfT5eZX9GwFrAROA9wIXS7rc9h+AbwGTgHcA/1tiXdLfdwS8AJwCbG97jqSNgXFtfm4xSmWEFjGwy8u/8P8o6fJS9jHgattX215i+zqgC/gAgO2f2p7ryn8D1wLvXMo4TrH9sO1FwPbA+raPsf2y7QeBM4B9B1HfV22/aPta4HngfNtP2H4UuBnYtnbsE8DJtl+xfSEwB/igpDcCOwJHlLq6ge9TJZEet9q+vHxOi3oLpI3P6xXgmNL+1cBCYCtJKwCfAj5n+1Hbi23fYvslBviOqP5RsLWk1WwvsD17EJ9djEJJaBED28P22uW1RynbFNirluj+SPWLfWMASbtJ+p9yGe6PVL9E11vKOB6ubW9Kddmy3v5RwIaDqO/x2vaiXt6vWXv/qF89k/lDVCOy8cAztp9r2Tehj7h71cbn9bTtP9fev1DiWw9YFZjbS7V9fke2nwf2oboEukDST8vILZZjSWgRQ/Mw8KNaolvb9hq2j5O0CnAJ1aWwDW2vDVwN9FxX7G2Ji+eB1WvvN+rlmPp5DwPzWtp/ve0P9HLecJigV18X/SvgsfIaJ+n1Lfse7SPu17xv4/Pqz1PAi8AWvezr8zsCsH2N7fdS/SPkfqoRbizHktAihubHwIckvV/SipJWLQ8vbAK8jupe0ZPAn8s9s/fVzn0cWFfSWrWybuADksZJ2gj4/ADt3wb8qTwoslqJYWtJ2w9bD19tA+BQSStL2gv4G6rLeQ8DtwDHls9gG+CfgHP7qetxYGK5XAgDf159sr0EOBP4dnk4ZUVJ/6ckyT6/I0kbSvqwqod0XqK6hLl4kJ9JjDJJaBFDUH6R7051me9JqtHA4cAK5fLbocBFwB+Af6R66KLn3PupHqR4sFwKGw/8CLgLmE91/+jCAdpfDHwImAzMoxqpfJ/qwYll4TdUD5A8BXwd+Kjtp8u+/YCJVKO1y4Dp5X5VX35Sfj4t6Y6BPq82HAbcDdwOPAN8k+p76PM7Kq9/LzE/A7wb+Mwg2oxRSFngMyL6I2ka8M+2d+x0LBH9yQgtIiIaIQktIiIaIZccIyKiETJCi4iIRsjUVx2y3nrreeLEiZ0OIyJiuTJz5synbK/f274ktA6ZOHEiXV1dnQ4jImK5IumhvvblkmNERDRCElpERDRCElpERDRCElpERDRCElpERDRCElpERDRCElpERDRCElpERDRC5nLsEI2XOajTUUREjCxPX7qcI2mm7am97csILSIiGiEJLSIiGiEJLSIiGiEJLSIiGiEJLSIiGiEJLSIiGiEJLSIiGmHYE5qkiZLuaSmbIekwSTtI+o2kbkn3SZpR9k+T9GQp73m9pdS1qLy/V9I5klbup+2dJD0r6c5S//Tavh0l3Sbp/vI6sDW+sn2WpEclrVLerydpvqS/rcX2jKR5ZfsXklaQdIqkeyTdLel2SZsN80cbERH9GOkVq88G9rZ9l6QVga1q+y60/dn6wZImAnNtTy7HXwfsDZzbTxs32/4HSWsA3ZKuAh4FzgP2sH2HpPWAayQ9avunvdSxGPgU8N2eAtt3A5NLXGcBV9m+uLzfDxgPbGN7iaRNgOfb+0giImI4jPQlxw2ABQC2F9u+t90TbS8GbgMmtHn888BMYAvgEOAs23eUfU8BXwSO7OP0k4EvSGo34W8MLLC9pNT/iO0/tB4k6UBJXZK6eKHNmiMioi0jndBOAuZIukzSQZJWre3bp+WS42r1E8uxbwd+3k5DktYFdgBmA5OokltdVynvze+BXwEfb6ct4CLgQyXuEyVt29tBtk+3PdX2VFZvs+aIiGjLskhofU3UZdvHAFOBa4F/5NXJ6ULbk2uvRaV8C0ndwNPA723PGqD9d0q6s7RxnO3ZgPqIq79Jxb4BHE4bn5HtR6gun34JWAJcL2mXgc6LiIjhsyzuoT0NrNNSNg6YB2B7LvBdSWcAT5aRVH967qFtDNwk6cO2r+jn+Jtt/0NL2WyqRFo/bwrQ5yVP2w+URLr3APH1HP8S8DPgZ5IeB/YArm/n3IiIWHrDPkKzvRBY0DNCkTQO2BX4laQPSlI5dEuqhy/+2Ga9C6jueX1pCGGdBkyT1PNQx7rAN4HjBzjv68BhA1UuaTtJ48v2CsA2wENDiDMiIoZoWd1DOwA4uoxwbgC+UkZmH6e6h9YN/AjYvzzsAa+9h/aOXuq9HFhd0jsHE0xJhh8DzpB0P3ALcKbtKwc4bzZwRxtNbABcWf5cYRbwZ+A7g4kxIiKWTtZD65CshxYRY1HWQ4uIiBjASP9h9bCQ9H6qe2B182zv2Yl4IiKi85bLhGb7GuCaTscRERGjRy45RkREIyyXI7QmmDJ+Cl3TuzodRkREY2SEFhERjZCEFhERjZCEFhERjZCEFhERjZCZQjokM4VEDI+lnXkili+ZKSQiIhovCS0iIhohCS0iIhohCS0iIhohCS0iIhohCS0iIhphuUhokjaSdIGkuZLulXS1pDdJmiTpBkm/lfQ7Sf9Pkso50yQtkbRNrZ57JE0s25+SdLekWaV891IuSUeX+n4r6UZJk2p1zJe0Xkt80yQ92bLi9ltG4rOJiIjKqJ+cuCSoy4Czbe9byiYDGwJnAf9i+1pJqwOXAJ8BTiunPwJ8Gdinpc5NSvl2tp+VtCawftl9CPAO4K22X5D0PuAKSZNsv9hPqBfa/uzS9zgiIoZieRih7Qy8Yvt7PQW2u4E3Ab+2fW0pewH4LHBk7dyrgEmStmqpcwPgOWBhOXeh7Xll3xHAv5b6KPXfAuw/3B2LiIjhszwktK2Bmb2UT2ottz0XWFPSG0rREuB44KiWc+8CHgfmSfqhpA8BlPPWKPXUdZX2+rNPyyXH1VoPkHSgpC5JXbwwQG0RETEoy0NC64uAvua8qZefB+wgabO/7LQXA7sCHwV+C5wkacYQ2+pxoe3Jtdei1wRln257qu2prD5AbRERMSjLQ0KbDUzpo/xV83lJ2hxYaPu5njLbfwZOpLqUSK3ctm+zfSywL/AR238Cni/11G0H3LvUPYmIiGVmeUhoNwCrSPp0T4Gk7YHfATtKek8pWw04heoSY6uzgPdQHvyQNF7SdrX9k4GHyvYJwCk9lwxL/TtSjfQiImKUGvVPOdq2pD2BkyUdCbwIzAc+D+wOnCrpNGBF4EfAd3qp42VJpwD/UYpWBr4laXyp70ng4LLvVGAd4G5Ji4H/BXZvuYQ4S9KSsn0RMIvqHtqOtWM+Y/uWpet9RES0K8vHdEiWj4kYHlk+ZmzJ8jEREdF4SWgREdEISWgREdEISWgREdEIo/4px6aaMn4KXdO7Oh1GRERjZIQWERGNkIQWERGNkIQWERGNkIQWERGNkJlCOiQzhcRokFk2YnmTmUIiIqLxktAiIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRktCGSNLC8nOipEWS7pR0n6TbJH2i0/FFRIw1mZx4eMy1vS2ApM2BSyWtYPuHHY4rImLMyAhtmNl+EPg34NBOxxIRMZZkhLZs3AG8ubVQ0oHAgQCsNcIRRUQ0XEZoy4Z6K7R9uu2ptqey+kiHFBHRbEloy8a2wH2dDiIiYixJQhtmkiYC3wJO7WwkERFjS+6hDY8tJN0JrAo8B5yaJxwjIkZWEtoQ2V6z/JwPrNbZaCIiIpccIyKiEZLQIiKiEZLQIiKiEZLQIiKiEfJQSIdMGT+FruldnQ4jIqIxMkKLiIhGSEKLiIhGSEKLiIhGSEKLiIhGkO1OxzAmabzMQZ2OIkYTT8//ixEDkTTT9tTe9mWEFhERjZCEFhERjZCEFhERjZCEFhERjZCEFhERjZCEFhERjTAqEpqkxZK6Jd0j6SeSVpc0UdI9LcfNkHRY2T5L0rxy3v2SpteOu0nS1JZzd5J0VdneUNJVku6SdK+kq0v5YNrslnRLKZ8m6claLF9YFp9TRET0bVQkNGCR7cm2twZeBg5u87zDbU8GJgOfkLRZm+cdA1xn+6223wIcOYhYDy+xTrb9jlr5hSWWvwO+LOmNg6gzIiKW0mhJaHU3A389yHNWLT+fb/P4jYFHet7YnjXI9vpk+2nggdJGRESMkFGV0CStBOwG3N3mKSdI6qZKThfYfqLN804DfiDpRklfljS+tm+L2iXFbl47Wjyhtv/cXvrwV1QJ9jVJUtKBkrokdfFCm5FGRERbRst6aKuV5AHVCO0H9D3Cqc8PdLjtiyWtCVwv6R22bxmoMdvXSNoc2JUqgd4paeuye265dAhU99BaTj/c9sW9VLuPpJ2BrYBP236xl3ZPB06HMvVVREQMm9EyQltUuy/1r7ZfBp4G1mk5bhzwVOvJthcCNwE7ttug7Wdsn2f748DtwLuGHH3lQtuTgHcCJ0raaCnri4iIQRgtCe01SpJaIGkXAEnjqEZUv2o9tlyqfDswt526Jf29pNXL9uuBLYDfD1PctwI/Aj43HPVFRER7Rm1CKw4Aji6XI28AvmK7nrR67qHNorrvdmlt308lPVJeP2mpdwrQJWkWcCvwfdu3txlT/R5at6TX9XLMN4FPlmQZEREjIMvHdEiWj4lWWT4mYmBZPiYiIhovCS0iIhohCS0iIhohCS0iIhohCS0iIhphtMwUMuZMGT+FruldnQ4jIqIxMkKLiIhGSEKLiIhGSEKLiIhGSEKLiIhGyNRXHZKpr5onU1dFLHuZ+ioiIhovCS0iIhohCS0iIhohCS0iIhohCS0iIhohCS0iIhph1CQ0SRtJukDSXEn3Srpa0pvKvi9IelHSWuX9+yV1l9dCSXPK9jmSdpL0bG1/t6T3lPM2lHSepAclzZR0q6Q9azHsKOk2SfeX14G1fTMkPVrqu1fSfqX8QEkX1o57Q+nDZiP12UVExChJaJIEXAbcZHsL228BjgI2LIfsB9wO7Alg+xrbk21PBrqA/cv7A8rxN/fsL69flDYuB35pe3PbU4B9gU1KDBsB5wEH234zsCNwkKQP1kI9qbS5O/CfklYGzgA26UmawDHAmbbnDfsHFRERfRoVCQ3YGXjF9vd6Cmx3275Z0hbAmsDRVIltqP4eeLmljYdsn1reHgKcZfuOsu8p4IvAka0V2f4d8AKwjqu/TP8X4GRJU4FdgBOWIs6IiBiC0bJ8zNbAzD727QecD9wMbCVpA9tPDFDfOyV1195/BJgE3NHPOZOAs1vKukr5q0jaDvhdTxy2Z0m6Brge2MP2y701UC5hVpcx1xqgBxERMSijZYTWn32BC2wvAS4F9mrjnNZLjnNbD5B0mqS7JN3eUwT0NndRvewLkuYAvwFmtBx3GvCo7Rv7Csr26ban2p7K6m30IiIi2jZaEtpsYEproaRtgC2B6yTNp0puQ73sOBvYrueN7UOoLg+uX9vfOj/YFODe2vuTbG8F7AOcI2nV2r4l5RURER0wWhLaDcAqkj7dUyBpe+A/gBm2J5bXeGCCpE2H2Maqkv6lVlYfJ50GTJM0ubS/LvBN4PjWimxfSnU58hNDiCMiIpaBUZHQyoMVewLvLY+8z6a6pLcT1dOPdZdRjdT6886Wx/Y/WtrYA3i3pHmSbqO6Z3ZEiWEB8DHgDEn3A7dQPa14ZR9tHAP8m6RR8RlGRIx1WT6mQ7J8TPNk+ZiIZS/Lx0REROMloUVERCMkoUVERCMkoUVERCOMlplCxpwp46fQNb2r02FERDRGRmgREdEISWgREdEISWgREdEISWgREdEImSmkQzJTSHNkhpCIkZOZQiIiovGS0CIiohGS0CIiohGS0CIiohGS0CIiohGS0CIiohEak9AkLS6rU98j6UpJa7fs/4KkFyWtVd6/v7ai9UJJc8r2OZJ2knRVOW6apCWStqnVdY+kiWV7TUnfLStt3ylppqRPj1zPIyICGpTQgEW2J9veGngGOKRl/37A7cCeALavKcdPBrqA/cv7A3qp+xHgy320+33gD8CWtrcFdgXGLX13IiJiMJqU0OpuBSb0vJG0BbAmcDRVYhusq4BJkraqF5Z63wYcbXsJgO0nbX9zqIFHRMTQNC6hSVoR2AW4ola8H3A+cDOwlaQNBlntEuB44KiW8knAXT3JrI3YDpTUJamLFwYZQURE9KtJCW01Sd3A01SX/K6r7dsXuKAknkuBvYZQ/3nADpI26+sASV8u9+Ee622/7dNtT7U9ldWHEEFERPSpSQltUbkftinwOso9tPIwx5bAdZLmUyW3QV92tP1n4ETgiFrxvcBbJa1Qjvl6ieENS9GPiIgYgiYlNABsPwscChwmaWWq5DXD9sTyGg9MkLTpEKo/C3gPsH5p6wGqB0q+Vi51ImlVQEvfk4iIGIzGJTQA23cCd1GNxvYFLms55LJSPth6XwZOAer34P4ZWBd4QNJM4Be8ehQXEREjIMvHdEiWj2mOLB8TMXKyfExERDReElpERDRCElpERDRCElpERDTCSp0OYKyaMn4KXdO7Oh1GRERjZIQWERGNkIQWERGNkIQWERGNkIQWERGNkJlCOiQzhXRGZvWIWL5lppCIiGi8JLSIiGiEJLSIiGiEJLSIiGiEJLSIiGiEJLSIiGiEJLSIiGiEthKapI0kXSBprqR7JV0t6U2SFknqLmXnSFq5HL+TpKskfbLs75b0sqS7y/ZxkqZJ+k6tjQMk3SNpdqnvsFK+g6TflPPukzSjnzinSXqyFtOna/v2kDRL0v0ljj1q+86S9NGyfZOkrtq+qaXs/bW+LJQ0p2yfI2l1SeeWeu+R9CtJaw7ie4iIiKU04Gz7kgRcBpxte99SNhnYEJhre7KkFYHrgL2Bc3vOtf1D4IflnPnAzrafKu+n1drYDfg88D7bj0laFfh42X02sLftu0o7Ww0Q8oW2PytpA2C2pCuAjYBvAe+1PU/SZsB1kh60PauXOjaQtJvtn9X6cg1wTYn3JuAw213l/ZeAx23/bXm/FfDKAHFGRMQwameEtjPwiu3v9RTY7gYerr1fDNwGTBhiHF+iShCPlfpetH1G2bcBsKCnHdv3tlOh7SeAucCmwGHAN2zPK/vmAccCh/dx+gnA0YOIf2Pg0Vrbc2y/1HqQpAMldUnq4oVB1B4REQNqJ6FtDczs74Ayono78PMhxtFfGycBcyRdJumg0taAJG0ObA48AEzqpf6uUt6bW4GXJO3cTlvAmcARkm6V9DVJW/Z2kO3TbU+1PZXV26w5IiLasrQPhWwhqRt4Gvh9H5fvlortY4CpwLXAPzJw0tynxHQ+cJDtZwABrZP49VZW9zXaHKWVEevmVCO7ccDtkv6mnXMjImJ4tJPQZgNT+tg31/Zk4K+BHSR9eIhx9NcGtufa/i6wC/BWSev2U9eFtifbfrvty2r1t05muR3Q5+VL2zcAqwI7tNMB2wttX2r7M8CPgQ+0c15ERAyPdhLaDcAqLU8Mbk91bwoA2wuAI6nuhQ3FscDxkjYq9a8i6dCy/cHyYArAlsBi4I+DrP9bwJckTSx1TgSOAk4c4LyvA18cqHJJfydpnbL9OuAtwEODjDEiIpbCgE852rakPYGTJR0JvAjMp3oqse5yYIakdw42CNtXS9oQ+EVJXqa6LwXV044nSXoB+DOwf3kIZTD1d0s6Ariy/GnBK8AXy6XCgeJ6so0mtgC+W2JfAfgpcMlgYoyIiKWT9dA6JOuhdUbWQ4tYvmU9tIiIaLwBLzmORpI+CXyupfjXtg/pRDwREdF5y2VCq89AEhERAbnkGBERDbFcjtCaYMr4KXRN7xr4wIiIaEtGaBER0QhJaBER0QhJaBER0QhJaBER0QiZKaRDMlPI4GSGj4iAzBQSERFjQBJaREQ0QhJaREQ0QhJaREQ0QhJaREQ0QhJaREQ0wphMaKr8StJutbK9Jf1c0mJJ3bXXkbVj1pf0iqSDWuqbL+luSbMk/bekTUeyPxERMUYTmqs/vjsY+LakVSWtAXwdOARYZHty7XVc7dS9gP8B9uul2p1tbwPcBBy9bHsQERGtxmRCA7B9D3AlcAQwHTjH9twBTtsP+HdgE0kT+jjmVqCvfRERsYyM9eVjvgLcAbwM9Pzl+WqSumvHHGv7QklvBDayfZuki4B9gG/3UueuwOW9NSbpQOBAANYang5ERERlTCc0289LuhBYaPulUrzI9uReDt8XuKhsXwD8gFcntBslbQg8QR+XHG2fDpwOZeqriIgYNmP2kmPNkvIayH7ANEnzgSuAt0rasrZ/Z2BTYDZwzHAHGRER/UtCa4OkrYA1bE+wPdH2ROBYqlHbX9heBHweOEDSuJGPNCJi7EpCe63VWh7bP45qdHZZy3GX0MvTjrYXAOdTPTEZEREjJMvHdEiWjxmcLB8TEZDlYyIiYgxIQouIiEZIQouIiEZIQouIiEYY039Y3UlTxk+ha3pXp8OIiGiMjNAiIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRsnxMh0h6DpjT6ThG2HrAU50OogPGYr/T57GhE33e1Pb6ve3I1FedM6evNX2aSlLXWOszjM1+p89jw2jrcy45RkREIyShRUREIyShdc7pnQ6gA8Zin2Fs9jt9HhtGVZ/zUEhERDRCRmgREdEISWgREdEISWgdIGlXSXMkPSDpyE7HMxIkzZd0t6RuSY1dqlvSmZKekHRPrWycpOsk/a78XKeTMQ63Pvo8Q9Kj5fvulvSBTsY43CS9UdKNku6TNFvS50p5Y7/rfvo8ar7r3EMbYZJWBH4LvBd4BLgd2M/2vR0NbBmTNB+YarvRf3gq6V3AQuAc21uXsuOBZ2wfV/4Bs47tIzoZ53Dqo88zgIW2v9XJ2JYVSRsDG9u+Q9LrgZnAHsA0Gvpd99PnvRkl33VGaCPvbcADth+0/TJwAbB7h2OKYWL7l8AzLcW7A2eX7bOpfgk0Rh99bjTbC2zfUbafA+4DJtDg77qfPo8aSWgjbwLwcO39I4yy/yiWEQPXSpop6cBOBzPCNrS9AKpfCsAGHY5npHxW0qxySbIxl95aSZoIbAv8hjHyXbf0GUbJd52ENvLUS9lYuO77d7a3A3YDDimXqaK5vgtsAUwGFgAndjacZUPSmsAlwOdt/6nT8YyEXvo8ar7rJLSR9wjwxtr7TYDHOhTLiLH9WPn5BHAZ1aXXseLxcv+h5z7EEx2OZ5mz/bjtxbaXAGfQwO9b0spUv9jPtX1pKW70d91bn0fTd52ENvJuB7aUtJmk1wH7Ald0OKZlStIa5SYyktYA3gfc0/9ZjXIF8Imy/QngvzoYy4jo+aVe7EnDvm9JAn4A3Gf727Vdjf2u++rzaPqu85RjB5THWk8GVgTOtP31Doe0TEnanGpUBtUKD+c1tc+Szgd2olpW43FgOnA5cBHwV8Dvgb1sN+Yhij76vBPVJSgD84GDeu4tNYGkHYGbgbuBJaX4KKp7So38rvvp836Mku86CS0iIhohlxwjIqIRktAiIqIRktAiIqIRktAiIqIRktAiIqIRktAihpGkxWXG8XskXSlp7TbOWTjA/rUlfab2fryki4ch1on1GfJHgqTJTZt5P0aPJLSI4bXI9uQy6/wzwCHDUOfawF8Smu3HbH90GOodUZJWovp7pSS0WCaS0CKWnVupTTwt6XBJt5dJXL/SerCkNSVdL+mOsnZczyoMxwFblJHfCfWRlaTfSJpUq+MmSVPK7CxnlvburNXVK0nTJF1eRpXzJH1W0r+Vc/9H0rha/SdLuqWMQt9WyseV82eV47cp5TMknS7pWuAc4Bhgn9KXfSS9rdR1Z/m5VS2eSyX9XNXaYsfXYt21fEZ3Sbq+lA2qv9FQtvPKK69helGtCwXVLDA/AXYt798HnE41OfUKwFXAu1rOWQl4Q9leD3igHD8RuKfWxl/eA18AvlK2NwZ+W7a/AXysbK9NtQbfGi2x1uuZVtp7PbA+8CxwcNl3EtVEtAA3AWeU7XfVzj8VmF62/x7oLtszqNbNWq3WzndqMbwBWKlsvwe4pHbcg8BawKrAQ1RzoK5PtVrFZuW4ce32N6/mv1bqM9NFxFCsJqmbKlnMBK4r5e8rrzvL+zWBLYFf1s4V8I2yEsESqtHdhgO0d1FpYzrVQos/qbX3YUmHlferUk3HdF8/dd3oap2r5yQ9C1xZyu8Gtqkddz5U66BJekO5T7gj8JFSfoOkdSWtVY6/wvaiPtpcCzhb0pZUUyetXNt3ve1nASTdC2wKrAP80va80osvuYcAAAGgSURBVFbPtFJD6W80TBJaxPBaZHty+WV+FdU9tFOoktWxtv+zn3P3pxqBTLH9iqpVvlftrzHbj0p6ulzi2wc4qOwS8BHbcwYR+0u17SW190t49e+K1vnyTP/LIj3fT5tfpUqke5Y1tm7qI57FJQb10j4Mrb/RMLmHFrEMlJHFocBhZcmNa4BPlbWkkDRBUuvij2sBT5RktjPViATgOapLgX25APgisJbtu0vZNcC/lhnSkbTtcPSr2KfUuSPwbOnrL6kSMpJ2Ap5y7+uDtfZlLeDRsj2tjbZvBd4tabPS1rhSviz7G8uJJLSIZcT2ncBdwL62rwXOA26VdDdwMa9NUucCUyV1USWH+0s9TwO/Lg9hnNBLUxdTLUN0Ua3sq1SX72aVB0i+Onw94w+SbgG+B/xTKZtRYp9F9RDLJ/o490bgLT0PhQDHA8dK+jXVfcd+2X4SOBC4VNJdwIVl17LsbywnMtt+RLRN0k3AYba7Oh1LRKuM0CIiohEyQouIiEbICC0iIhohCS0iIhohCS0iIhohCS0iIhohCS0iIhrh/wPeZDDNR8HGEwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "model = cat.CatBoostRegressor(random_state=100,cat_features=cat_feat,verbose=0)\n",
    "plot_importance(model,features)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1 = sub['SalesInMillions'].values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LGB Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Stratified_Kfold(model,train,target,test,features,folds):\n",
    "    max_iter = folds\n",
    "    folds = StratifiedKFold(n_splits = max_iter)\n",
    "    oofs = np.zeros(len(train))\n",
    "    test_preds = np.zeros(len(test))\n",
    "\n",
    "\n",
    "    for fold_, (trn_idx, val_idx) in enumerate(folds.split(train, pd.qcut(target, 10, labels=False, duplicates='drop'))):\n",
    "\n",
    "        print(f'\\n---- Fold {fold_} -----\\n')\n",
    "\n",
    "        X_trn, y_trn = train.iloc[trn_idx][features], target.iloc[trn_idx]\n",
    "        X_val, y_val = train.iloc[val_idx][features], target.iloc[val_idx]\n",
    "        X_test = test[features]\n",
    "        print(X_trn.shape[1], X_val.shape[1])\n",
    "\n",
    "\n",
    "        _ = model.fit(X_trn, y_trn, eval_set = [(X_val, y_val)], verbose=100, early_stopping_rounds=200, eval_metric='rmse')\n",
    "\n",
    "        oofs[val_idx] = model.predict(X_val)\n",
    "        current_test_pred = model.predict(X_test)\n",
    "        test_preds += model.predict(X_test)/max_iter\n",
    "\n",
    "        print(f'\\n Fold {metric(y_val, oofs[val_idx])}')\n",
    "\n",
    "\n",
    "    print(f'\\nOOF val score: {metric(target,oofs)}')\n",
    "    \n",
    "    return test_preds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "train = dtrain.copy()\n",
    "test = dtest.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "cat_feat = ['CONSOLE','YEAR','CATEGORY','PUBLISHER','RATING']\n",
    "\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "label_classes = {}\n",
    "\n",
    "data = pd.concat([train,test])\n",
    "\n",
    "for i in cat_feat:\n",
    "    le = LabelEncoder()\n",
    "    data[i] = le.fit_transform(data[i])\n",
    "    label_classes[i] = le\n",
    "    \n",
    "train = data[data['SalesInMillions'].notnull()]\n",
    "test = data[data['SalesInMillions'].isnull()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['CATEGORY',\n",
       " 'CONSOLE',\n",
       " 'CRITICS_POINTS',\n",
       " 'ID',\n",
       " 'PUBLISHER',\n",
       " 'RATING',\n",
       " 'USER_POINTS',\n",
       " 'YEAR']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "features = [c for c in train.columns if c not in ['SalesInMillions']]\n",
    "target = train['SalesInMillions']\n",
    "features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "---- Fold 0 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.50219\tvalid_0's l2: 2.25656\n",
      "[200]\tvalid_0's rmse: 1.52578\tvalid_0's l2: 2.32799\n",
      "Early stopping, best iteration is:\n",
      "[63]\tvalid_0's rmse: 1.48904\tvalid_0's l2: 2.21724\n",
      "\n",
      " Fold 1.4890386263356403\n",
      "\n",
      "---- Fold 1 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.75233\tvalid_0's l2: 3.07065\n",
      "[200]\tvalid_0's rmse: 1.71429\tvalid_0's l2: 2.9388\n",
      "[300]\tvalid_0's rmse: 1.71381\tvalid_0's l2: 2.93716\n",
      "[400]\tvalid_0's rmse: 1.71162\tvalid_0's l2: 2.92964\n",
      "[500]\tvalid_0's rmse: 1.72294\tvalid_0's l2: 2.96853\n",
      "[600]\tvalid_0's rmse: 1.72666\tvalid_0's l2: 2.98136\n",
      "Early stopping, best iteration is:\n",
      "[417]\tvalid_0's rmse: 1.70967\tvalid_0's l2: 2.92297\n",
      "\n",
      " Fold 1.7096706737706058\n",
      "\n",
      "---- Fold 2 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 2.00205\tvalid_0's l2: 4.00822\n",
      "[200]\tvalid_0's rmse: 2.00426\tvalid_0's l2: 4.01708\n",
      "Early stopping, best iteration is:\n",
      "[84]\tvalid_0's rmse: 1.99907\tvalid_0's l2: 3.99627\n",
      "\n",
      " Fold 1.9990676752414207\n",
      "\n",
      "---- Fold 3 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 4.47373\tvalid_0's l2: 20.0143\n",
      "[200]\tvalid_0's rmse: 4.3886\tvalid_0's l2: 19.2598\n",
      "[300]\tvalid_0's rmse: 4.33354\tvalid_0's l2: 18.7796\n",
      "[400]\tvalid_0's rmse: 4.30008\tvalid_0's l2: 18.4906\n",
      "[500]\tvalid_0's rmse: 4.2965\tvalid_0's l2: 18.4599\n",
      "[600]\tvalid_0's rmse: 4.30077\tvalid_0's l2: 18.4966\n",
      "[700]\tvalid_0's rmse: 4.27996\tvalid_0's l2: 18.318\n",
      "[800]\tvalid_0's rmse: 4.2656\tvalid_0's l2: 18.1953\n",
      "[900]\tvalid_0's rmse: 4.25491\tvalid_0's l2: 18.1043\n",
      "[1000]\tvalid_0's rmse: 4.24645\tvalid_0's l2: 18.0323\n",
      "Did not meet early stopping. Best iteration is:\n",
      "[988]\tvalid_0's rmse: 4.24554\tvalid_0's l2: 18.0246\n",
      "\n",
      " Fold 4.245543951641364\n",
      "\n",
      "---- Fold 4 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.82529\tvalid_0's l2: 3.33167\n",
      "[200]\tvalid_0's rmse: 1.71983\tvalid_0's l2: 2.95782\n",
      "[300]\tvalid_0's rmse: 1.65649\tvalid_0's l2: 2.74395\n",
      "[400]\tvalid_0's rmse: 1.64886\tvalid_0's l2: 2.71873\n",
      "[500]\tvalid_0's rmse: 1.63151\tvalid_0's l2: 2.66182\n",
      "[600]\tvalid_0's rmse: 1.62706\tvalid_0's l2: 2.64732\n",
      "[700]\tvalid_0's rmse: 1.61335\tvalid_0's l2: 2.60289\n",
      "[800]\tvalid_0's rmse: 1.60576\tvalid_0's l2: 2.57847\n",
      "[900]\tvalid_0's rmse: 1.60879\tvalid_0's l2: 2.58822\n",
      "[1000]\tvalid_0's rmse: 1.61073\tvalid_0's l2: 2.59446\n",
      "Did not meet early stopping. Best iteration is:\n",
      "[857]\tvalid_0's rmse: 1.60296\tvalid_0's l2: 2.56949\n",
      "\n",
      " Fold 1.602963796371373\n",
      "\n",
      "---- Fold 5 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.59477\tvalid_0's l2: 2.54328\n",
      "[200]\tvalid_0's rmse: 1.53477\tvalid_0's l2: 2.35553\n",
      "[300]\tvalid_0's rmse: 1.50697\tvalid_0's l2: 2.27095\n",
      "[400]\tvalid_0's rmse: 1.49894\tvalid_0's l2: 2.24683\n",
      "[500]\tvalid_0's rmse: 1.4879\tvalid_0's l2: 2.21384\n",
      "[600]\tvalid_0's rmse: 1.48301\tvalid_0's l2: 2.19932\n",
      "[700]\tvalid_0's rmse: 1.47341\tvalid_0's l2: 2.17093\n",
      "[800]\tvalid_0's rmse: 1.4707\tvalid_0's l2: 2.16296\n",
      "[900]\tvalid_0's rmse: 1.46474\tvalid_0's l2: 2.14547\n",
      "[1000]\tvalid_0's rmse: 1.46238\tvalid_0's l2: 2.13856\n",
      "Did not meet early stopping. Best iteration is:\n",
      "[955]\tvalid_0's rmse: 1.46003\tvalid_0's l2: 2.1317\n",
      "\n",
      " Fold 1.4600329274710402\n",
      "\n",
      "---- Fold 6 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.4716\tvalid_0's l2: 2.16561\n",
      "[200]\tvalid_0's rmse: 1.51222\tvalid_0's l2: 2.28682\n",
      "Early stopping, best iteration is:\n",
      "[44]\tvalid_0's rmse: 1.45967\tvalid_0's l2: 2.13064\n",
      "\n",
      " Fold 1.4596724214724013\n",
      "\n",
      "---- Fold 7 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 2.21631\tvalid_0's l2: 4.91202\n",
      "[200]\tvalid_0's rmse: 2.07687\tvalid_0's l2: 4.31339\n",
      "[300]\tvalid_0's rmse: 1.98087\tvalid_0's l2: 3.92387\n",
      "[400]\tvalid_0's rmse: 1.91959\tvalid_0's l2: 3.68481\n",
      "[500]\tvalid_0's rmse: 1.89552\tvalid_0's l2: 3.593\n",
      "[600]\tvalid_0's rmse: 1.87757\tvalid_0's l2: 3.52527\n",
      "[700]\tvalid_0's rmse: 1.86007\tvalid_0's l2: 3.45986\n",
      "[800]\tvalid_0's rmse: 1.84305\tvalid_0's l2: 3.39684\n",
      "[900]\tvalid_0's rmse: 1.8292\tvalid_0's l2: 3.34596\n",
      "[1000]\tvalid_0's rmse: 1.81092\tvalid_0's l2: 3.27943\n",
      "Did not meet early stopping. Best iteration is:\n",
      "[1000]\tvalid_0's rmse: 1.81092\tvalid_0's l2: 3.27943\n",
      "\n",
      " Fold 1.810918893918147\n",
      "\n",
      "---- Fold 8 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.86521\tvalid_0's l2: 3.47902\n",
      "[200]\tvalid_0's rmse: 1.84372\tvalid_0's l2: 3.39932\n",
      "[300]\tvalid_0's rmse: 1.82208\tvalid_0's l2: 3.31998\n",
      "[400]\tvalid_0's rmse: 1.80803\tvalid_0's l2: 3.26898\n",
      "[500]\tvalid_0's rmse: 1.80078\tvalid_0's l2: 3.24282\n",
      "[600]\tvalid_0's rmse: 1.79675\tvalid_0's l2: 3.2283\n",
      "[700]\tvalid_0's rmse: 1.79162\tvalid_0's l2: 3.20991\n",
      "[800]\tvalid_0's rmse: 1.78362\tvalid_0's l2: 3.18132\n",
      "[900]\tvalid_0's rmse: 1.78374\tvalid_0's l2: 3.18173\n",
      "[1000]\tvalid_0's rmse: 1.77152\tvalid_0's l2: 3.1383\n",
      "Did not meet early stopping. Best iteration is:\n",
      "[997]\tvalid_0's rmse: 1.77072\tvalid_0's l2: 3.13544\n",
      "\n",
      " Fold 1.7707178771067953\n",
      "\n",
      "---- Fold 9 -----\n",
      "\n",
      "8 8\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[100]\tvalid_0's rmse: 1.81351\tvalid_0's l2: 3.28882\n",
      "[200]\tvalid_0's rmse: 1.89422\tvalid_0's l2: 3.58806\n",
      "Early stopping, best iteration is:\n",
      "[78]\tvalid_0's rmse: 1.80744\tvalid_0's l2: 3.26683\n",
      "\n",
      " Fold 1.8074379001241507\n",
      "\n",
      "OOF val score: 2.0902343169894624\n"
     ]
    }
   ],
   "source": [
    "model = lgb.LGBMRegressor(n_estimators=1000,learning_rate=0.02)\n",
    "m2 = Stratified_Kfold(model,train,target,test,features,10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## XGB Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "---- Fold 0 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:1.95517\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.34363\n",
      "[200]\tvalidation_0-rmse:1.39873\n",
      "Stopping. Best iteration:\n",
      "[29]\tvalidation_0-rmse:1.29679\n",
      "\n",
      "\n",
      " Fold 1.2967934711667484\n",
      "\n",
      "---- Fold 1 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.35858\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.67357\n",
      "[200]\tvalidation_0-rmse:1.66851\n",
      "[300]\tvalidation_0-rmse:1.66537\n",
      "[400]\tvalidation_0-rmse:1.66290\n",
      "[500]\tvalidation_0-rmse:1.66575\n",
      "Stopping. Best iteration:\n",
      "[383]\tvalidation_0-rmse:1.65766\n",
      "\n",
      "\n",
      " Fold 1.6576557975737773\n",
      "\n",
      "---- Fold 2 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.61605\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.58020\n",
      "[200]\tvalidation_0-rmse:1.55394\n",
      "[300]\tvalidation_0-rmse:1.55896\n",
      "[400]\tvalidation_0-rmse:1.56273\n",
      "Stopping. Best iteration:\n",
      "[251]\tvalidation_0-rmse:1.55309\n",
      "\n",
      "\n",
      " Fold 1.5530895977917536\n",
      "\n",
      "---- Fold 3 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.20751\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:2.58378\n",
      "[200]\tvalidation_0-rmse:2.61591\n",
      "Stopping. Best iteration:\n",
      "[11]\tvalidation_0-rmse:1.91901\n",
      "\n",
      "\n",
      " Fold 1.9190087319598328\n",
      "\n",
      "---- Fold 4 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.25012\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.69266\n",
      "[200]\tvalidation_0-rmse:1.69477\n",
      "Stopping. Best iteration:\n",
      "[20]\tvalidation_0-rmse:1.61475\n",
      "\n",
      "\n",
      " Fold 1.6147484521877857\n",
      "\n",
      "---- Fold 5 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:3.22376\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:2.53839\n",
      "[200]\tvalidation_0-rmse:2.52852\n",
      "[300]\tvalidation_0-rmse:2.54048\n",
      "Stopping. Best iteration:\n",
      "[196]\tvalidation_0-rmse:2.52461\n",
      "\n",
      "\n",
      " Fold 2.5246128755152797\n",
      "\n",
      "---- Fold 6 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:6.85636\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:5.03136\n",
      "[200]\tvalidation_0-rmse:4.98907\n",
      "[300]\tvalidation_0-rmse:4.95954\n",
      "[400]\tvalidation_0-rmse:4.96576\n",
      "Stopping. Best iteration:\n",
      "[282]\tvalidation_0-rmse:4.95517\n",
      "\n",
      "\n",
      " Fold 4.955167440927568\n",
      "\n",
      "---- Fold 7 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.56469\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.45942\n",
      "[200]\tvalidation_0-rmse:1.45272\n",
      "[300]\tvalidation_0-rmse:1.46009\n",
      "Stopping. Best iteration:\n",
      "[170]\tvalidation_0-rmse:1.44151\n",
      "\n",
      "\n",
      " Fold 1.4415095492690617\n",
      "\n",
      "---- Fold 8 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.97995\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:2.20414\n",
      "[200]\tvalidation_0-rmse:2.17483\n",
      "[300]\tvalidation_0-rmse:2.14783\n",
      "[400]\tvalidation_0-rmse:2.13873\n",
      "[500]\tvalidation_0-rmse:2.14377\n",
      "[600]\tvalidation_0-rmse:2.15100\n",
      "Stopping. Best iteration:\n",
      "[406]\tvalidation_0-rmse:2.13646\n",
      "\n",
      "\n",
      " Fold 2.136456038350182\n",
      "\n",
      "---- Fold 9 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.20076\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.33773\n",
      "[200]\tvalidation_0-rmse:1.34970\n",
      "Stopping. Best iteration:\n",
      "[66]\tvalidation_0-rmse:1.32306\n",
      "\n",
      "\n",
      " Fold 1.3230582993402609\n",
      "\n",
      "---- Fold 10 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.97210\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.32812\n",
      "[200]\tvalidation_0-rmse:1.32177\n",
      "[300]\tvalidation_0-rmse:1.33425\n",
      "[400]\tvalidation_0-rmse:1.33981\n",
      "Stopping. Best iteration:\n",
      "[208]\tvalidation_0-rmse:1.32091\n",
      "\n",
      "\n",
      " Fold 1.3209105764834765\n",
      "\n",
      "---- Fold 11 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.21535\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.35032\n",
      "[200]\tvalidation_0-rmse:1.36645\n",
      "Stopping. Best iteration:\n",
      "[94]\tvalidation_0-rmse:1.34831\n",
      "\n",
      "\n",
      " Fold 1.3483073535453587\n",
      "\n",
      "---- Fold 12 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.20973\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:2.10549\n",
      "[200]\tvalidation_0-rmse:2.09835\n",
      "Stopping. Best iteration:\n",
      "[15]\tvalidation_0-rmse:1.92694\n",
      "\n",
      "\n",
      " Fold 1.9269364358085201\n",
      "\n",
      "---- Fold 13 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.04664\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.23026\n",
      "[200]\tvalidation_0-rmse:1.25839\n",
      "Stopping. Best iteration:\n",
      "[68]\tvalidation_0-rmse:1.21465\n",
      "\n",
      "\n",
      " Fold 1.2146456787883737\n",
      "\n",
      "---- Fold 14 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.08319\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.39087\n",
      "[200]\tvalidation_0-rmse:1.39459\n",
      "Stopping. Best iteration:\n",
      "[47]\tvalidation_0-rmse:1.36958\n",
      "\n",
      "\n",
      " Fold 1.3695776808442173\n",
      "\n",
      "---- Fold 15 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:3.51235\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:2.41666\n",
      "[200]\tvalidation_0-rmse:2.37168\n",
      "[300]\tvalidation_0-rmse:2.35236\n",
      "[400]\tvalidation_0-rmse:2.33420\n",
      "[500]\tvalidation_0-rmse:2.32416\n",
      "[600]\tvalidation_0-rmse:2.32493\n",
      "Stopping. Best iteration:\n",
      "[482]\tvalidation_0-rmse:2.32218\n",
      "\n",
      "\n",
      " Fold 2.322177317474424\n",
      "\n",
      "---- Fold 16 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.21889\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.31179\n",
      "[200]\tvalidation_0-rmse:1.31877\n",
      "[300]\tvalidation_0-rmse:1.34272\n",
      "Stopping. Best iteration:\n",
      "[142]\tvalidation_0-rmse:1.31033\n",
      "\n",
      "\n",
      " Fold 1.310330712127302\n",
      "\n",
      "---- Fold 17 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:3.29456\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:5.34037\n",
      "[200]\tvalidation_0-rmse:5.30966\n",
      "Stopping. Best iteration:\n",
      "[7]\tvalidation_0-rmse:3.05641\n",
      "\n",
      "\n",
      " Fold 3.056415176175524\n",
      "\n",
      "---- Fold 18 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.65771\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:2.10537\n",
      "[200]\tvalidation_0-rmse:2.08105\n",
      "[300]\tvalidation_0-rmse:2.10230\n",
      "[400]\tvalidation_0-rmse:2.11246\n",
      "Stopping. Best iteration:\n",
      "[215]\tvalidation_0-rmse:2.07495\n",
      "\n",
      "\n",
      " Fold 2.0749538640919547\n",
      "\n",
      "---- Fold 19 -----\n",
      "\n",
      "8 8\n",
      "[0]\tvalidation_0-rmse:2.37585\n",
      "Will train until validation_0-rmse hasn't improved in 200 rounds.\n",
      "[100]\tvalidation_0-rmse:1.55008\n",
      "[200]\tvalidation_0-rmse:1.53759\n",
      "[300]\tvalidation_0-rmse:1.54265\n",
      "Stopping. Best iteration:\n",
      "[178]\tvalidation_0-rmse:1.53475\n",
      "\n",
      "\n",
      " Fold 1.534752369271699\n",
      "\n",
      "OOF val score: 2.075506713163473\n"
     ]
    }
   ],
   "source": [
    "model = xgb.XGBRegressor(n_estimators=1000,learning_rate=0.05)\n",
    "m3 = Stratified_Kfold(model,train,target,test,features,20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = m1*0.4 + m2*0.3 + m3*0.3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    1503.000000\n",
       "mean        2.075646\n",
       "std         1.076925\n",
       "min         0.730431\n",
       "25%         1.613587\n",
       "50%         1.767763\n",
       "75%         2.105328\n",
       "max        14.495615\n",
       "Name: SalesInMillions, dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sub = pd.DataFrame({'SalesInMillions': y_pred})\n",
    "sub['SalesInMillions'] = np.clip(sub['SalesInMillions'], target.min(), target.max())\n",
    "sub['SalesInMillions'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "sub.to_excel(\"ensemble_3models.xlsx\",index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SalesInMillions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>1.626047</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2.244813</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>3.222870</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2.019227</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2.024462</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   SalesInMillions\n",
       "0         1.626047\n",
       "1         2.244813\n",
       "2         3.222870\n",
       "3         2.019227\n",
       "4         2.024462"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sub.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
